Results 1 - 20 of 423
1.
bioRxiv ; 2024 Apr 13.
Article in English | MEDLINE | ID: mdl-38645134

ABSTRACT

Missense variants can have a range of functional impacts depending on factors such as the specific amino acid substitution and location within the gene. To interpret their deleteriousness, studies have sought to identify regions within genes that are specifically intolerant of missense variation [1-12]. Here, we leverage the patterns of rare missense variation in 125,748 individuals in the Genome Aggregation Database (gnomAD) [13] against a null mutational model to identify transcripts that display regional differences in missense constraint. Missense-depleted regions are enriched for ClinVar [14] pathogenic variants, de novo missense variants from individuals with neurodevelopmental disorders (NDDs) [15,16], and complex trait heritability. Following ClinGen calibration recommendations for the ACMG/AMP guidelines, we establish that regions with less than 20% of their expected missense variation achieve moderate support for pathogenicity. We create a missense deleteriousness metric (MPC) that incorporates regional constraint and outperforms other deleteriousness scores at stratifying case and control de novo missense variation, with a strong enrichment in NDDs. These results provide additional tools to aid in missense variant interpretation.
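
The 20% threshold in the abstract above is a statement about an observed/expected (O/E) ratio of missense variants within a region. A minimal sketch of that arithmetic follows; the function names, the region labels and the counts are hypothetical, and the expected counts would in practice come from a mutational model that is not reproduced here.

```python
def missense_oe(observed: int, expected: float) -> float:
    """Observed/expected ratio of missense variants for one region."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_strongly_depleted(observed: int, expected: float,
                         threshold: float = 0.2) -> bool:
    """True when the region has less than `threshold` of its expected
    missense variation (0.2 mirrors the 20% figure in the abstract)."""
    return missense_oe(observed, expected) < threshold


# Hypothetical regions: (observed count, expected count under the null model).
regions = {"region_A": (3, 40.0), "region_B": (35, 40.0)}
flags = {name: is_strongly_depleted(obs, exp)
         for name, (obs, exp) in regions.items()}
print(flags)
```

Here region_A has an O/E of 3/40 = 0.075 and falls under the threshold, while region_B (35/40 = 0.875) does not.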

2.
Eur J Hum Genet ; 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38467730

ABSTRACT

Intellectual disability (ID) is a common disorder, yet there is a wide spectrum of impairment from mild to profoundly affected individuals. Mild ID is seen as the low extreme of the general distribution of intelligence, while severe ID is often seen as a monogenic disorder caused by rare, pathogenic, highly penetrant variants. To investigate the genetic factors influencing mild and severe ID, we evaluated rare and common variation in the Northern Finland Intellectual Disability cohort (n = 1096 ID patients), a cohort with a high percentage of mild ID (n = 550) and from a population bottleneck enriched in rare, damaging variation. Despite this enrichment, we found only a small percentage of ID was due to recessive Finnish-enriched variants (0.5%). A larger proportion was linked to dominant variation, with a significant burden of rare, damaging variation in both mild and severe ID. This rare variant burden was enriched in more severe ID (p = 2.4e-4), patients without a relative with ID (p = 4.76e-4), and in those with features associated with monogenic disorders. We also found a significant burden of common variants associated with decreased cognitive function, with no difference between mild and more severe ID. When we included common and rare variants in a joint model, the rare and common variants had additive effects in both mild and severe ID. A multimodel inference approach also found that common and rare variants together best explained ID status (ΔAIC = 16.8, ΔBIC = 10.2). Overall, we report evidence for the additivity of rare and common variant burden throughout the spectrum of intellectual disability.

4.
Nat Genet ; 56(3): 377-382, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38182742

ABSTRACT

Gestational diabetes mellitus (GDM) is a common metabolic disorder affecting more than 16 million pregnancies annually worldwide [1,2]. GDM is related to an increased lifetime risk of type 2 diabetes (T2D) [1-3], with over a third of women developing T2D within 15 years of their GDM diagnosis. The diseases are hypothesized to share a genetic predisposition [1-7], but few studies have sought to uncover the genetic underpinnings of GDM. Most studies have evaluated the impact of T2D loci only [8-10], and the three prior genome-wide association studies of GDM [11-13] have identified only five loci, limiting the power to assess to what extent variants or biological pathways are specific to GDM. We conducted the largest genome-wide association study of GDM to date in 12,332 cases and 131,109 parous female controls in the FinnGen study and identified 13 GDM-associated loci, including nine new loci. Genetic features distinct from T2D were identified both at the locus and genomic scale. Our results suggest that the genetics of GDM risk falls into the following two distinct categories: one part conventional T2D polygenic risk and one part predominantly influencing mechanisms disrupted in pregnancy. Loci with GDM-predominant effects map to genes related to islet cells, central glucose homeostasis, steroidogenesis and placental expression.


Subject(s)
Diabetes Mellitus, Type 2 , Diabetes, Gestational , Islets of Langerhans , Pregnancy , Female , Humans , Diabetes Mellitus, Type 2/genetics , Diabetes, Gestational/genetics , Genome-Wide Association Study , Placenta
5.
Nat Genet ; 56(2): 327-335, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38200129

ABSTRACT

Acquiring a sufficiently powered cohort of control samples matched to a case sample can be time-consuming or, in some cases, impossible. Accordingly, an ability to leverage genetic data from control samples that were already collected elsewhere could dramatically improve power in genetic association studies. Sharing of control samples can pose significant challenges, since most human genetic data are subject to strict sharing regulations. Here, using the properties of singular value decomposition and a subsampling algorithm, we developed a method that selects the best-matching controls from an external pool of samples while complying with personal data protection and eliminating the need for genotype sharing. We provide access to a library of 39,472 exome sequencing controls at http://dnascore.net, enabling association studies for case cohorts lacking control subjects. Using this approach, control sets can be selected from this online library with a prespecified matching accuracy, ensuring well-calibrated association analysis for both rare and common variants.


Subject(s)
Algorithms , Exome , Humans , Exome/genetics , Genotype , Genetic Association Studies , Research
6.
Nat Genet ; 56(1): 152-161, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057443

ABSTRACT

Recessive diseases arise when both copies of a gene are impacted by a damaging genetic variant. When a patient carries two potentially causal variants in a gene, accurate diagnosis requires determining that these variants occur on different copies of the chromosome (that is, are in trans) rather than on the same copy (that is, in cis). However, current approaches for determining phase, beyond parental testing, are limited in clinical settings. Here we developed a strategy for inferring phase for rare variant pairs within genes, leveraging genotypes observed in the Genome Aggregation Database (v2, n = 125,748 exomes). Our approach estimates phase with 96% accuracy, both in trio data and in patients with Mendelian conditions and presumed causal compound heterozygous variants. We provide a public resource of phasing estimates for coding variants and counts per gene of rare variants in trans that can aid interpretation of rare co-occurring variants in the context of recessive disease.


Subject(s)
Exome , High-Throughput Nucleotide Sequencing , Humans , Exome/genetics , Exome Sequencing , Genotype
7.
Nature ; 625(7993): 92-100, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38057664

ABSTRACT

The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders [1-4], but attempts to assess constraint for non-protein-coding regions have proved more difficult. Here we aggregate, process and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome allele frequency reference dataset, and use it to build a genomic constraint map for the whole genome (genomic non-coding constraint of haploinsufficient variation (Gnocchi)). We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation. As expected, the average constraint for protein-coding sequences is stronger than that for non-coding regions. Within the non-coding genome, constrained regions are enriched for known regulatory elements and variants that are implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained protein-coding genes, which in turn suggests that non-coding constraint can aid the identification of constrained genes that are as yet unrecognized by current gene constraint metrics. We demonstrate that this genome-wide constraint map improves the identification and interpretation of functional human genetic variation.


Subject(s)
Genome, Human , Genomics , Models, Genetic , Mutation , Humans , Access to Information , Databases, Genetic , Datasets as Topic , Gene Frequency , Genome, Human/genetics , Mutation/genetics , Selection, Genetic
8.
Nat Genet ; 56(1): 162-169, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38036779

ABSTRACT

Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Humans , Bayes Theorem , Multifactorial Inheritance , Algorithms
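
The replication failure rate (RFR) described in the fine-mapping abstract above compares results from a downsampled analysis against the full-sample analysis. The abstract does not spell out the formula, so the following is only a sketch under stated assumptions: a variant counts as "confident" when its posterior inclusion probability (PIP) in the downsampled run is at least 0.9, and as a "failure" when its full-sample PIP is at most 0.1. Both thresholds, the function name and the example PIPs are hypothetical.

```python
import math


def replication_failure_rate(pip_down: dict, pip_full: dict,
                             high: float = 0.9, low: float = 0.1) -> float:
    """Fraction of variants that are confident in the downsampled run
    (PIP >= high) but have a low PIP (<= low) in the full-sample run.

    Only variants present in both runs are considered (an assumption).
    Returns NaN when no variant passes the `high` threshold.
    """
    confident = [v for v, p in pip_down.items()
                 if p >= high and v in pip_full]
    if not confident:
        return math.nan
    failures = sum(1 for v in confident if pip_full[v] <= low)
    return failures / len(confident)


# Hypothetical PIPs for four variants in each analysis.
pip_down = {"v1": 0.95, "v2": 0.92, "v3": 0.50, "v4": 0.99}
pip_full = {"v1": 0.97, "v2": 0.05, "v3": 0.60, "v4": 0.40}
rfr = replication_failure_rate(pip_down, pip_full)
print(rfr)
```

In this toy input three variants are confident in the downsampled run (v1, v2, v4) and only v2 drops below the low threshold in the full run, so the rate is one third. A well-calibrated method would be expected to keep this rate low.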
9.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-36747613

ABSTRACT

Underrepresented populations are often excluded from genomic studies due in part to a lack of resources supporting their analyses. The 1000 Genomes Project (1kGP) and Human Genome Diversity Project (HGDP), which have recently been sequenced to high coverage, are valuable genomic resources because of the global diversity they capture and their open data sharing policies. Here, we harmonized a high quality set of 4,094 whole genomes from HGDP and 1kGP with data from the Genome Aggregation Database (gnomAD) and identified over 153 million high-quality SNVs, indels, and SVs. We performed a detailed ancestry analysis of this cohort, characterizing population structure and patterns of admixture across populations, analyzing site frequency spectra, and measuring variant counts at global and subcontinental levels. We also demonstrate substantial added value from this dataset compared to the prior versions of the component resources, typically combined via liftover and variant intersection; for example, we catalog millions of new genetic variants, mostly rare, compared to previous releases. In addition to unrestricted individual-level public release, we provide detailed tutorials for conducting many of the most common quality control steps and analyses with these data in a scalable cloud-computing environment and publicly release this new phased joint callset for use as a haplotype resource in phasing and imputation pipelines. This jointly called reference panel will serve as a key resource to support research of diverse ancestry populations.

10.
medRxiv ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38076851

ABSTRACT

Focal segmental glomerulosclerosis (FSGS) is a common cause of nephrotic syndrome, with an annual incidence in the United States of 24 cases per million in African-Americans and 5 cases per million in European-Americans. Among glomerular diseases in Europe and Latin America, FSGS was the second most frequent diagnosis, and in Asia the fifth. We expand previous efforts to understand the genetics of FSGS by performing a case-control study involving ethnically diverse groups of FSGS cases (726) and a pool of controls (13,994), using panel sequencing of approximately 2,500 podocyte-expressed genes. Through rare variant association tests, we replicated known risk genes: KANK1, COL4A4, and APOL1. A novel significant association was observed for the gene encoding complement receptor 1 (CR1). High-risk rare variants in CR1 in the European-American cohort were commonly observed in Latin- and African-Americans. Therefore, a combined rare and common variant analysis was used to replicate the CR1 association in non-European populations. The CR1 risk variant, rs17047661, gives rise to the Sl1/Sl2 (R1601G) allele that was previously associated with protection against cerebral malaria. Pleiotropic effects of rs17047661 may explain the difference in allele frequencies across continental ancestries and suggest a possible role for genetically driven alterations of adaptive immunity in the pathogenesis of FSGS.

11.
medRxiv ; 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38076931

ABSTRACT

A diagnosis of epilepsy has significant consequences for an individual but is often challenging in clinical practice. Novel biomarkers are thus greatly needed. Here, we investigated how common genetic factors (epilepsy polygenic risk scores [PRSs]) influence epilepsy risk in detailed longitudinal electronic health records (EHRs) of more than 360,000 Finns spanning up to 50 years of individuals' lifetimes. Individuals with a high genetic generalized epilepsy PRS (PRSGGE) in FinnGen had an increased risk for genetic generalized epilepsy (GGE) (hazard ratio [HR] 1.55 per PRSGGE standard deviation [SD]) across their lifetime and after unspecified seizure events. Effect sizes of epilepsy PRSs were comparable to effect sizes in clinically curated data, supporting our EHR-derived epilepsy diagnoses. Within 10 years after an unspecified seizure, the GGE rate was 37% when PRSGGE > 2 SD compared to 5.6% when PRSGGE < -2 SD. The effect of PRSGGE was even larger on GGE subtypes of idiopathic generalized epilepsy (IGE) (HR 2.1 per SD PRSGGE). We further report significantly larger effects of PRSGGE on epilepsy in females and in younger age groups. Analogously, we found a significant but more modest focal epilepsy PRS burden associated with non-acquired focal epilepsy (NAFE). We found PRSGGE specifically associated with GGE in comparison with >2000 independent diseases, whereas PRSNAFE was also associated with diseases other than NAFE, such as back pain. Here, we show that epilepsy-specific PRSs have good discriminative ability after a first seizure event, that is, in circumstances where the prior probability of epilepsy is high, outlining their potential to serve as biomarkers for an epilepsy diagnosis.

12.
Cell Genom ; 3(12): 100436, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38116116

ABSTRACT

Genome-wide association studies (GWASs) have identified tens of thousands of genetic loci associated with human complex traits. However, the majority of GWASs were conducted in individuals of European ancestries. Failure to capture global genetic diversity has limited genomic discovery and has impeded equitable delivery of genomic knowledge to diverse populations. Here we report findings from 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia. We identified 968 novel genetic loci, pinpointed novel causal variants through statistical fine-mapping, compared the genetic architecture across TWB, Biobank Japan, and UK Biobank, and evaluated the utility of cross-phenotype, cross-population polygenic risk scores in disease risk prediction. These results demonstrated the potential to advance discovery through diversifying GWAS populations and provided insights into the common genetic basis of human complex traits in East Asia.

13.
medRxiv ; 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37961173

ABSTRACT

Mass General Brigham, an integrated healthcare system based in the Greater Boston area of Massachusetts, annually serves 1.5 million patients. We established the Mass General Brigham Biobank (MGBB), encompassing 142,238 participants, to unravel the intricate relationships among genomic profiles, environmental context, and disease manifestations within clinical practice. In this study, we highlight the impact of ancestral diversity in the MGBB by employing population genetics, geospatial assessment, and association analyses of rare and common genetic variants. The population structures captured by the genetics mirror the sequential immigration to the Greater Boston area throughout American history, highlighting communities tied to shared genetic and environmental factors. Our investigation underscores the potency of unbiased, large-scale analyses in a healthcare-affiliated biobank, elucidating the dynamic interplay across genetics, immigration, structural geospatial factors, and health outcomes in one of the earliest American sites of European colonization.

14.
Am J Hum Genet ; 110(12): 2068-2076, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38000370

ABSTRACT

DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.


Subject(s)
DNA , Trout , Humans , Animals , Sequence Analysis, DNA/methods , Genotype , Homozygote , High-Throughput Nucleotide Sequencing/methods , Software
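
The CHARR abstract above describes estimating contamination from reference reads that appear within homozygous alternate calls. The abstract does not give the formula, so the sketch below uses a deliberately simplified stand-in: the pooled fraction of reference reads across homozygous alternate calls that pass a depth filter. The function name, the tuple layout, the "1/1" genotype string and the minimum depth of 10 are all assumptions, and the published estimator may weight or normalize sites differently.

```python
import math


def charr_like(calls, min_depth: int = 10) -> float:
    """Pooled fraction of reference reads at homozygous alternate calls.

    `calls` is an iterable of (genotype, ref_reads, alt_reads) tuples.
    Only calls with genotype "1/1" and total depth >= min_depth are used.
    Returns NaN when no call qualifies.
    """
    ref_total = 0
    depth_total = 0
    for genotype, ref_reads, alt_reads in calls:
        depth = ref_reads + alt_reads
        if genotype != "1/1" or depth < min_depth:
            continue  # skip non-hom-alt calls and low-depth sites
        ref_total += ref_reads
        depth_total += depth
    if depth_total == 0:
        return math.nan
    return ref_total / depth_total


# Hypothetical variant-level records for one sample.
calls = [
    ("1/1", 0, 30),
    ("1/1", 2, 38),
    ("0/1", 15, 15),  # heterozygous call, ignored
    ("1/1", 1, 29),
    ("1/1", 0, 5),    # below the depth filter, ignored
]
estimate = charr_like(calls)
print(estimate)
```

With these records, three calls qualify and contribute 3 reference reads out of 100 total, giving 0.03. The point of the sketch is that only per-variant genotype and allele-depth fields are needed, which is why such a metric can be computed from VCF-like data rather than from BAM/CRAM files.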
15.
Invest Ophthalmol Vis Sci ; 64(14): 33, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37988105

ABSTRACT

Purpose: Apolipoprotein E4 (APOE4), a known risk factor for Alzheimer's disease, has controversially been associated with reduced risk of primary open-angle glaucoma (POAG) and age-related macular degeneration (AMD). Here, we sought to systematically quantify the associations of APOE haplotypes with age-related ocular diseases and to assess their scope and age-dependency. Methods: We included genetic and registry data from 412,171 Finnish individuals in the FinnGen study. Disease endpoints were defined using nationwide registries. APOE genotypes were directly genotyped using Illumina and Affymetrix arrays or imputed using a custom Finnish reference panel. We evaluated the disease associations of APOE genotypes containing ε2 (without ε4) and ε4 (without ε2) compared with the ε3ε3 genotype using logistic regressions stratified by age. Results: APOE ε4 enriched haplotypes were inversely associated with overall glaucoma (odds ratio [OR] = 0.95, 95% confidence interval [CI] = 0.92-0.99, P = 0.0047), and its subtypes POAG (OR = 0.95, P = 0.027), normal-tension glaucoma (OR = 0.87, P = 0.0058), and suspected glaucoma (OR = 0.95, P = 0.014). Individuals with the ε4 allele also had lower odds for AMD (OR = 0.80, 95% CI = 0.76-0.84, P < 0.001), seen both in dry and neovascular subgroups. A slight negative association was also detected in senile cataract, but this was not reproducible in age-group analyses. Conclusions: Our results support prior evidence of the inverse association of APOE ε4 with glaucoma, but the association was weaker than for AMD. We could not show an association with exfoliation glaucoma, supporting the hypothesis that APOE may be involved in regulating retinal ganglion cell degeneration rather than intraocular pressure.


Subject(s)
Apolipoprotein E4 , Glaucoma, Open-Angle , Glaucoma , Macular Degeneration , Humans , Apolipoprotein E4/genetics , Eye , Glaucoma/genetics , Glaucoma, Open-Angle/genetics , Haplotypes , Macular Degeneration/genetics
16.
Nat Genet ; 55(12): 2255-2268, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38036787

ABSTRACT

The human leukocyte antigen (HLA) locus plays a critical role in complex traits spanning autoimmune and infectious diseases, transplantation and cancer. While coding variation in HLA genes has been extensively documented, regulatory genetic variation modulating HLA expression levels has not been comprehensively investigated. Here we mapped expression quantitative trait loci (eQTLs) for classical HLA genes across 1,073 individuals and 1,131,414 single cells from three tissues. To mitigate technical confounding, we developed scHLApers, a pipeline to accurately quantify single-cell HLA expression using personalized reference genomes. We identified cell-type-specific cis-eQTLs for every classical HLA gene. Modeling eQTLs at single-cell resolution revealed that many eQTL effects are dynamic across cell states even within a cell type. HLA-DQ genes exhibit particularly cell-state-dependent effects within myeloid, B and T cells. For example, a T cell HLA-DQA1 eQTL (rs3104371) is strongest in cytotoxic cells. Dynamic HLA regulation may underlie important interindividual variability in immune responses.


Subject(s)
Gene Expression Regulation , Quantitative Trait Loci , Humans , Gene Expression Regulation/genetics , Quantitative Trait Loci/genetics , Genome-Wide Association Study , Polymorphism, Single Nucleotide
17.
JID Innov ; 3(6): 100217, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38034848

ABSTRACT

Several observational studies have demonstrated a consistent pattern of decreased melanoma risk among patients with vitiligo. More recently, this finding has been supported by a suggested genetic relationship between the two entities, with certain variants significantly associated with an increased risk of melanoma, basal cell carcinoma, and squamous cell carcinoma but a decreased risk of vitiligo. We compared 48 associated variants from a recently published GWAS and identified three variants, located in the TYR, MC1R-DEF8, and RALY-EIF2S2-ASIP-AHCY-ITCH loci, that correlated with an increased risk for melanoma, basal cell carcinoma, and squamous cell carcinoma and a decreased risk for vitiligo. We then used results of skin cancers and vitiligo GWAS to compare the shared genetic properties between these two traits through an unbiased Mendelian randomization analysis. Our results suggest that the inverse genetic relationship between common skin cancers and vitiligo is broader than previously reported owing to the influence of shared genome-wide significant associations.

18.
Sci Transl Med ; 15(719): eadg5252, 2023 Oct 25.
Article in English | MEDLINE | ID: mdl-37878672

ABSTRACT

Effective tissue repair requires coordinated intercellular communication to sense damage, remodel the tissue, and restore function. Here, we dissected the healing response in the intestinal mucosa by mapping intercellular communication at single-cell resolution and integrating with spatial transcriptomics. We demonstrated that a risk variant for Crohn's disease, hepatocyte growth factor activator (HGFAC) Arg509His (R509H), disrupted a damage-sensing pathway connecting the coagulation cascade to growth factors that drive the differentiation of wound-associated epithelial (WAE) cells and production of a localized retinoic acid (RA) gradient to promote fibroblast-mediated tissue remodeling. Specifically, we showed that HGFAC R509H was activated by thrombin protease activity but exhibited impaired proteolytic activation of the growth factor macrophage-stimulating protein (MSP). In Hgfac R509H mice, reduced MSP activation in response to wounding of the colon resulted in impaired WAE cell induction and delayed healing. Through integration of single-cell transcriptomics and spatial transcriptomics, we demonstrated that WAE cells generated RA in a spatially restricted region of the wound site and that mucosal fibroblasts responded to this signal by producing extracellular matrix and growth factors. We further dissected this WAE cell-fibroblast signaling circuit in vitro using a genetically tractable organoid coculture model. Collectively, these studies exploited a genetic perturbation associated with human disease to disrupt a fundamental biological process and then reconstructed a spatially resolved mechanistic model of tissue healing.


Subject(s)
Crohn Disease , Mice , Humans , Animals , Crohn Disease/genetics , Crohn Disease/metabolism , Signal Transduction , Epithelial Cells/metabolism , Intestinal Mucosa/metabolism , Cell Differentiation
19.
iScience ; 26(10): 108053, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37841595

ABSTRACT

Crohn's disease (CD) and ulcerative colitis (UC) are two etiologically related yet distinctive subtypes of the inflammatory bowel diseases (IBD). Differentiating CD from UC can be challenging using conventional clinical approaches in a subset of patients. We designed and evaluated a novel molecular-based prediction model aggregating genetics, serum biomarkers, and tobacco smoking information to assist the diagnosis of CD and UC in over 30,000 samples. A joint model combining genetics, serum biomarkers and smoking explains 46% (42-50%, 95% CI) of phenotypic variation. Despite modest overlaps with serum biomarkers, genetics makes unique contributions to distinguishing IBD subtypes. Smoking status only explains 1% (0-6%, 95% CI) of the phenotypic variance suggesting it may not be an effective biomarker. This study reveals that molecular-based models combining genetics, serum biomarkers, and smoking information could complement current diagnostic strategies and help classify patients based on biologic state rather than imperfect clinical parameters.

20.
Cell Genom ; 3(8): 100345, 2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37601974

ABSTRACT

Stroke is the second leading cause of death and disability worldwide. Stroke prevalence varies by sex and ancestry, possibly due to genetic heterogeneity between subgroups. We performed a genome-wide meta-analysis of 16 biobanks across multiple ancestries to study the genetics of ischemic stroke (60,176 cases, 1,310,725 controls) as part of the Global Biobank Meta-analysis Initiative (GBMI) and further combined the results with previously published MegaStroke. Five novel loci for ischemic stroke (LAMC1, CALCRL, PLSCR1, CDKN1A, and SWAP70) were identified after replication in four additional datasets. One previously reported locus showed significant ancestry heterogeneity (ABO), and one showed significant sex heterogeneity (ALDH2). The ALDH2 association was male specific (males p = 1.67e-24, females p = 0.126) and was additionally observed only in the East Asian ancestry (male) samples. These findings emphasize the need for more diverse datasets with large sample sizes to further understand the genetic predisposition of stroke in different ancestry and sex groups.
